ABSTRACT Genome-wide association studies of complex traits often are complicated by relatedness 
among individuals. Ignoring or inappropriately accounting for relatedness often results in inflated type I 
error rates. Either genotype or pedigree data can be used to estimate relatedness for use in mixed models 
when undertaking quantitative trait locus mapping. We performed simulations to investigate methods for 
controlling type I error and optimizing power considering both full and partial pedigrees and, similarly, both 
sparse and dense marker coverage; we also examined real data sets. (1) When marker density was low, 
estimating relatedness by genotype data alone failed to control the type I error rate; (2) this was resolved by 
combining both genotype and pedigree data. (3) When sufficiently dense marker data were used to 
estimate relatedness, type I error was well controlled and power increased; however, (4) this was only true 
when the relatedness was estimated using genotype data that excluded genotypes on the chromosome 
currently being scanned for a quantitative trait locus. 
In their simplest form, genome-wide association studies (GWAS) 
assume that all subjects are unrelated. However, human population 
isolates and various model organism populations contain individuals 
with varying levels of relatedness. For polygenic traits, this results in 
correlations among both genotypes and phenotypes and can produce 
inflated type I error rates when performing GWAS (Newman et al. 
2001; Cheng et al. 2010). Mixed models are commonly used to ac- 
count for relatedness using a random effect and may optionally model 
the effect of individual markers as a fixed effect (Goldgar 1990; Amos 
1994; Xu and Atchley 1995; Abney et al 2000; Yu et al 2006; Kang 
et al 2008; Cheng et al 2010). Relatedness can be estimated from 
a pedigree or from genotype data. The use of genotype (e.g., Yu et al 
2006; Kang et al 2008) or pedigree (Abney et al 2002; Cheng et al 
2010) data for GWAS has been implemented previously. However, 
when both types of data are available, methods to control the type I 
error rate while maximizing power have not been systematically 
explored. 

Although siblings share an average of 50% of their genome 
identity-by-descent (IBD), the realized sharing is variable. Genotype 
data allow estimation of realized sharing (Ritland 1996; Lynch and 
Ritland 1999; Wang 2002; Frentiu et al 2008), as opposed to the 
average level of sharing that is obtained from pedigree data. However, 
genotypes only provide information about identity-by-state (IBS), 
which is only an approximation to IBD. Furthermore, the accuracy 
of estimates of realized sharing depends on the density of genotype 
data. When both pedigree and genotype data are available, a very 
pragmatic question arises: how should these data be used to control 
false-positive rates while increasing power? 

In this study, we used simulations to address this question. We 
estimated relatedness by using genotype data, pedigree data, and the 
combination of both genotype and pedigree data under various 
models. We sought methods that could control the type I error rate 
and maximize power. 

METHODS, SIMULATIONS, AND RESULTS 
Statistical models 

Our methods are based on the linear mixed model for quantitative 
traits with a single major diallelic quantitative trait locus (QTL) 
modeled as a fixed effect and P polygenes modeled as random effects. 



G3: Genes | Genomes | Genetics

Volume 3 | October 2013 | 1861



$$ y = \mu + xa + zd + \sum_{i=1}^{P} u_i + e, \qquad (1) $$

where $y$ is the vector of trait values; $\mu$ is the vector of trait means 
that may depend on known covariates; $x$ is a vector of genotypes 
with values -1, 0, and 1 corresponding to genotypes AA, AB, and 
BB of a QTL; $a$ is the additive effect of the QTL; $z$ is a vector whose 
elements take on value 1 when the subject has QTL genotype AB and 
value 0 otherwise; $d$ is the dominance effect of the QTL; $u_i$ is the 
genetic effect at the $i$th polygenic locus; and $e$ is the vector of 
residual effects. We assume the random effects to be distributed 
normally, $e \sim N(0, I\sigma_e^2)$, where $I$ is the identity matrix, and 
$u_i \sim N(0, \Omega_i)$, with the polygenic effects independent of each other 
and independent of the residual effect $e$. We model the polygenic covariances 
as $\Omega_i = 2\Phi_i\sigma_{i,a}^2 + \Delta_i\sigma_{i,d}^2$, where the $(j,k)$th element of $\Phi_i$ is the 
probability that at polygene $i$ a randomly drawn allele from subject $j$ 
and a randomly drawn allele from subject $k$ are IBD, the $(j,k)$th 
element of $\Delta_i$ is the probability that at polygene $i$ the two alleles in 
subject $j$ are both IBD with the two alleles in subject $k$ and that 
neither subject is autozygous, and $\sigma_{i,a}^2$ and $\sigma_{i,d}^2$ are the additive 
and dominance polygenic variances at locus $i$. In general, when 
inbreeding is present, there are additional variance components 
(Gillois 1964; Harris 1964). The additional variance components, 
however, typically are small (Lynch and Ritland 1999; Abney et al 
2000), and we choose to ignore them here. The total covariance 
matrix for the polygenic effect, then, is 

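$$ \Omega = \sum_{i=1}^{P} \Omega_i = 2\sum_{i=1}^{P} \Phi_i\sigma_{i,a}^2 + \sum_{i=1}^{P} \Delta_i\sigma_{i,d}^2. \qquad (2) $$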

Our objective is, given the genotype data at a marker, to 
test whether the marker is a QTL. That is, we test the null 
hypothesis $H_0$: $a = 0$ and $d = 0$ vs. the alternative $H_1$: $a \neq 0$ or 
$d \neq 0$. We use the likelihood ratio test with the trait model 
$y \sim N(\mu + xa + zd, \hat{\Omega} + I\sigma_e^2)$, where we must use an estimator $\hat{\Omega}$ 
for the true covariance matrix in equation (2) because the true 
relationship matrices $\Phi_i$ and $\Delta_i$ and polygenic variances $\sigma_{i,a}^2$ and 
$\sigma_{i,d}^2$ are unknown. Typically, $\Phi_i$ and $\Delta_i$ are estimated by their 
expected value given a pedigree, $E(\Phi_i) = \Phi_p$ and $E(\Delta_i) = \Delta_p$ for 
all loci $i$, where $\Phi_p$ and $\Delta_p$ are the pedigree-based estimates. 
However, it is also possible to estimate these quantities from the 
marker data, and when the marker data are informative enough, 
this may more accurately estimate the true sharing at the 
polygenic loci. We label the marker-based estimates $\Phi_m$ and $\Delta_m$; 
they are described in the Relationship matrices subsection. This leads us to 
consider three possible models for the polygenic covariance in the 
likelihood ratio test: VM1: $\Omega_p = 2\Phi_p\sigma_{p,a}^2 + \Delta_p\sigma_{p,d}^2$, where the 
relationship matrices are estimated using only pedigree information; 
VM2: $\Omega_m = 2\Phi_m\sigma_{m,a}^2 + \Delta_m\sigma_{m,d}^2$, where the relationship matrices 
are estimated using only observed genotype data; and VM3: 
$\Omega_{mp} = 2\Phi_m\sigma_{m,a}^2 + 2\Phi_p\sigma_{p,a}^2 + \Delta_m\sigma_{m,d}^2 + \Delta_p\sigma_{p,d}^2$, where the relationship 
matrices are estimated from both genotype and pedigree data. 
In all three variance models the variance parameters are 
estimated by maximum likelihood. 

Relationship matrices 

We obtained relationship matrices as described and implemented in 
the R package "QTLRel" (Cheng et al 2011). The pedigree estimates 
are based on Karigl's algorithms (Karigl 1981). To obtain the 
marker-based estimates $\Phi_m$ and $\Delta_m$, we considered each genotyped locus for 
a pair of subjects and used an estimator based on IBS rather than IBD. 
For a diallelic marker $l$, the $(j,k)$th element of $\Phi_{m,l}$ takes on value 1.0 
when subjects $j$ and $k$ are both homozygous for the same allele, 0.5 
when one is homozygous and the other heterozygous or both are 
heterozygous, or 0 when both are homozygous for different alleles. 
We define the $(j,k)$th element of $\Delta_{m,l}$ as 1.0 when both $j$ and $k$ are 
heterozygous and 0 otherwise. Our marker-based estimates are the 
mean across the $L$ markers used in the estimator, $\Phi_m = \frac{1}{L}\sum_{l=1}^{L}\Phi_{m,l}$ and 
$\Delta_m = \frac{1}{L}\sum_{l=1}^{L}\Delta_{m,l}$. In Table 1 we consider different sets of the $L$ loci in 
our estimators. Note that under the assumption that all the additive 
polygenic variances are equal and all the dominance polygenic 
variances are equal, the true polygenic covariance matrix given 
in equation (2) would closely resemble our estimated covariance matrix 
given in VM2, with the summation over polygenes replaced by 
a summation over markers. Although we do not require this 
assumption to use our estimators for the relationship matrices, it does suggest 
that a more efficient estimator might be chosen by appropriately 
weighting each term in the summations for $\Phi_m$ and $\Delta_m$, with the 
optimal weights depending upon both how well IBS at a marker captures 
IBD at a polygene and the relative magnitude of the variance at that 
polygene. We do not explore this issue here. 
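
The IBS rules above translate directly into code. The following is a minimal sketch in Python, assuming complete genotype data coded -1, 0, and 1 for AA, AB, and BB (the coding used for $x$ in equation (1)); the function names are hypothetical and this is not the QTLRel implementation.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def phi_entry(g_j: int, g_k: int) -> float:
    """Per-marker IBS value for the additive relationship matrix."""
    if g_j != 0 and g_k != 0:
        # Both homozygous: same allele gives 1.0, different alleles give 0.0.
        return 1.0 if g_j == g_k else 0.0
    # At least one subject is heterozygous.
    return 0.5


def delta_entry(g_j: int, g_k: int) -> float:
    """Per-marker value for the dominance relationship matrix."""
    return 1.0 if g_j == 0 and g_k == 0 else 0.0


def ibs_relationship_matrices(
    genotypes: Sequence[Sequence[int]],
) -> Tuple[Matrix, Matrix]:
    """Average the per-marker values over all L markers.

    genotypes[j][l] is the genotype of subject j at marker l.
    Missing genotypes are not handled in this sketch.
    """
    n = len(genotypes)
    n_markers = len(genotypes[0])
    phi = [[0.0] * n for _ in range(n)]
    delta = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(n):
            for l in range(n_markers):
                phi[j][k] += phi_entry(genotypes[j][l], genotypes[k][l])
                delta[j][k] += delta_entry(genotypes[j][l], genotypes[k][l])
            phi[j][k] /= n_markers
            delta[j][k] /= n_markers
    return phi, delta


# Two subjects typed at three markers (made-up genotypes).
phi_m, delta_m = ibs_relationship_matrices([[-1, 0, 1], [-1, 0, -1]])
```

For these two subjects the three markers contribute 1.0, 0.5, and 0 to the off-diagonal element of the additive matrix, so its mean is 0.5.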

Mapping populations 

We considered two mapping populations: an advanced intercross line 
(AIL) F26 and a structured population (STR). For the AIL, we assumed 
that one male and one female offspring from each of 144 F_n (2 < n < 
25) breeding pairs were randomly mated with a nonsibling to produce 
the next generation. The final sample used for mapping consisted of 
four offspring from each of 144 F25 breeding pairs for a sample size of 
576. The STR consisted of subsamples from three subpopulations. The 
first subsample was from an AIL F26, where one male and one female 
progeny from each of 48 F_n (2 < n < 25) breeding pairs were 
randomly mated with nonsiblings to produce the next generation and 
four offspring of each F25 breeding pair contributed to the subsample. 
The other two subsamples were generated as follows. A male and 
a female progeny from each of 96 F_n (2 < n < 12) breeding pairs 
were randomly mated with nonsiblings to produce the next 
generation. The F13 breeding pairs were randomly split into two 
subpopulations of equal size and bred until F26 with the same breeding scheme 
as above within each subpopulation. The STR sample size was also 
576. These pedigrees were created once and were used in replicate 
simulations. 
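
The AIL breeding scheme can be sketched as a pedigree generator. The following Python sketch assumes a fixed number of breeding pairs, one son and one daughter per pair, and random pairing that avoids full siblings; the generation index starts at 0 for an arbitrary founding set of pairs rather than reproducing the F numbering, and all function names are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# (sire id, dam id, generation index); a parent id of 0 means unknown.
Record = Tuple[int, int, int]


def nonsib_matching(n_pairs: int, rng: random.Random) -> List[int]:
    """Return dam_family[i] for the son of family i, never matching siblings."""
    if n_pairs < 2:
        raise ValueError("need at least two families to avoid sib mating")
    families = list(range(n_pairs))
    while True:  # rejection sampling of a permutation with no fixed point
        rng.shuffle(families)
        if all(son != dam for son, dam in enumerate(families)):
            return families[:]


def simulate_ail_pedigree(
    n_pairs: int, n_pair_generations: int, n_final_offspring: int, seed: int = 0
) -> Dict[int, Record]:
    """Build a pedigree with n_pair_generations generations of breeding pairs."""
    rng = random.Random(seed)
    pedigree: Dict[int, Record] = {}
    next_id = 1
    pairs = []
    for _ in range(n_pairs):  # founding pairs of the sketch
        pedigree[next_id] = (0, 0, 0)
        pedigree[next_id + 1] = (0, 0, 0)
        pairs.append((next_id, next_id + 1))
        next_id += 2
    for generation in range(1, n_pair_generations):
        sons, daughters = [], []
        for sire, dam in pairs:  # one son and one daughter per pair
            pedigree[next_id] = (sire, dam, generation)
            pedigree[next_id + 1] = (sire, dam, generation)
            sons.append(next_id)
            daughters.append(next_id + 1)
            next_id += 2
        dam_family = nonsib_matching(n_pairs, rng)
        pairs = [(sons[i], daughters[dam_family[i]]) for i in range(n_pairs)]
    for sire, dam in pairs:  # final mapping sample
        for _ in range(n_final_offspring):
            pedigree[next_id] = (sire, dam, n_pair_generations)
            next_id += 1
    return pedigree


ped = simulate_ail_pedigree(
    n_pairs=144, n_pair_generations=24, n_final_offspring=4, seed=1
)
```

With 144 pairs and four offspring per final pair, the last generation contains 576 individuals, matching the sample size given above.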

Sparse marker simulations 

We simulated 15 chromosomes that were 400 cM each; each chromosome 
had 101 evenly spaced markers (4 cM spacing). A total of 500 
polygenic QTL were evenly spaced on the first five chromosomes. 
We simulated two possible relationships between the markers and 
the polygenic QTL (Table 1): completely linked, in which every 
polygenic QTL was located exactly at a marker locus, or incompletely 
linked, in which each polygenic QTL was midway between two adjacent markers. 
On chromosomes 1-5, the additive and dominance effects of a 
polygenic QTL were generated randomly from uniform distributions 
U(-0.15, 0.15) and U(-0.08, 0.08), respectively, in each replicate 
simulation. The residual effect was simulated from a normal 
distribution N(0, 1). The polygenic QTL accounted for approximately 
84% of the total variation. Heritabilities in this range are not 
uncommonly observed in model organisms and humans (e.g., Yang et al 
2013). We expect our results will apply across a broad range of 
heritabilities. There were no QTL on chromosomes 6-10. We scanned 
chromosomes 11-15 for putative QTL. When we evaluated type I 
error rates, there were no QTL on chromosomes 11-15. When 
evaluating statistical power, there was one QTL at the position of the 
marker in the middle of the 11th chromosome with an additive effect of 
0.5 and a dominance effect of 0.2, which accounted for approximately 
2.5% of the total variation. 

Table 1 Names of marker sets used to estimate genotype-based relationship matrices

Chromosomes used               Marker set name when polygenes and markers are
in marker set                  completely linked             incompletely linked
1-5                            CL (complete linkage only)    IL (incomplete linkage only)
6-10                           UL (unlinked only)            UL (unlinked only)
1-10                           CUL (both CL and UL)          IUL (both IL and UL)
1-10 + (11-15, choose one)^a   CUL + 1                       -

QTL scans were only performed on chromosomes 11-15, with these marker sets used to estimate relatedness. QTL, quantitative trait locus. 
^a Estimates of relatedness included chromosomes 1-10 and an additional chromosome selected from 11-15 such that the chromosome selected is the one being 
scanned, as described in the text. 
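
The trait model used in the sparse-marker simulations can be sketched as follows. This Python sketch assumes genotypes coded -1, 0, and 1 and takes the genotypes as given (gene dropping through the pedigree is not shown); the function names and the `major_effect` default are illustrative labels for the effect sizes quoted above.

```python
import random
from typing import List, Optional, Sequence, Tuple

Effect = Tuple[float, float]  # (additive effect, dominance effect)


def draw_polygenic_effects(n_qtl: int, rng: random.Random) -> List[Effect]:
    """Draw additive effects from U(-0.15, 0.15) and dominance from U(-0.08, 0.08)."""
    return [
        (rng.uniform(-0.15, 0.15), rng.uniform(-0.08, 0.08)) for _ in range(n_qtl)
    ]


def genetic_value(genotypes: Sequence[int], effects: Sequence[Effect]) -> float:
    """Sum a*x + d*z over loci, where z = 1 only for heterozygotes (x == 0)."""
    return sum(
        a * g + (d if g == 0 else 0.0) for g, (a, d) in zip(genotypes, effects)
    )


def simulate_trait(
    poly_genotypes: Sequence[int],
    effects: Sequence[Effect],
    rng: random.Random,
    residual_sd: float = 1.0,
    major_genotype: Optional[int] = None,
    major_effect: Effect = (0.5, 0.2),
) -> float:
    """Trait value = polygenic value + optional major QTL + normal residual."""
    value = genetic_value(poly_genotypes, effects) + rng.gauss(0.0, residual_sd)
    if major_genotype is not None:  # power simulations only
        value += genetic_value([major_genotype], [major_effect])
    return value


rng = random.Random(2013)
effects = draw_polygenic_effects(500, rng)
```

Leaving `major_genotype` as `None` corresponds to the type I error simulations, in which no QTL is present on the scanned chromosomes.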

Variance model estimators 

For each variance model, VM1, VM2, and VM3, we considered 
different estimators that varied in their level of informativeness. For 
VM1, we obtained estimates of the relationship matrices as follows: 
(1) using no pedigree (Naive), equivalent to assuming all subjects are 
independent; (2) using only the final three generations (i.e., 
individuals, parents, and grandparents) of the pedigree (Last3); (3) using only 
the final six generations of the pedigree (Last6); and (4) using the 
entire pedigree (AllPed). 

VM2 consists of estimates based on different subsets of genotype 
data. Our intent was to investigate scenarios in which the markers 
were more or less informative regarding the polygenes. An ideal case 
is when we consider only those markers that are completely linked to 
the polygenes (left column of Table 1). A less-ideal case is when we 
only consider those markers that are incompletely linked to the 
polygenes (right column of Table 1). The first row of Table 1 considers 
chromosomes 1-5, which contain all of the polygenes. The cells in this 
row are labeled complete linkage (CL) and incomplete linkage 
(IL). The second row of Table 1 considers chromosomes 6-10, which 
do not contain any polygenes; in this case both columns are equivalent 
and labeled unlinked (UL). The third row of Table 1 considers 
chromosomes 1-10, thus representing the combination of the prior two 
rows. These are a combination of completely linked and unlinked 
(CUL), and incompletely linked and unlinked (IUL). The final row 
includes CUL plus one of chromosomes 11-15, such that the 
additional chromosome included in the estimate of relatedness is the one 
being scanned for the QTL. 
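
The marker sets in Table 1 amount to a mapping from a set name to the chromosomes whose markers enter the relatedness estimate. A minimal sketch in Python follows; the function name and the string "CUL+1" are hypothetical labels chosen for this illustration.

```python
from typing import List, Optional


def marker_set_chromosomes(
    name: str, scanned_chromosome: Optional[int] = None
) -> List[int]:
    """Return the chromosomes used to estimate relatedness for a marker set."""
    polygenic = list(range(1, 6))   # chromosomes 1-5 carry all polygenes
    unlinked = list(range(6, 11))   # chromosomes 6-10 carry no QTL
    if name in ("CL", "IL"):
        return polygenic
    if name == "UL":
        return unlinked
    if name in ("CUL", "IUL"):
        return polygenic + unlinked
    if name == "CUL+1":
        # Adds the chromosome currently being scanned (one of 11-15).
        if scanned_chromosome not in range(11, 16):
            raise ValueError("scanned_chromosome must be one of 11-15")
        return polygenic + unlinked + [scanned_chromosome]
    raise ValueError(f"unknown marker set: {name}")
```

CL and IL return the same chromosomes because they differ in where the polygenes sit relative to the markers, not in which chromosomes are used.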

For the third variance model, VM3, we combined estimators from 
both VM1 and VM2. Specifically, we used the IUL set of markers 
to estimate $\Phi_m$ and $\Delta_m$ and either AllPed or Last3 to estimate $\Phi_p$ 
and $\Delta_p$. 

We evaluated the performance, in terms of type I error rates and 
power, of the different methods for estimating the relationship 
matrices. Although chromosomes 1-10 were sometimes used to 
estimate the polygenic variation, only chromosomes 11-15 were scanned 
for the presence of a QTL. In simulations in which there was not 
a QTL on chromosome 11, any significant association was considered 
a false positive. When a QTL was present on chromosome 11, any 
significant association on this chromosome was considered a true 
positive. In both instances significant associations were defined as 
those exceeding a 0.05 significance level based on 5000 permutations 
(Cheng and Palmer 2012). We obtained a similar result from 5000 
parametric bootstrap simulations. We performed 2500 replicates to 
evaluate type I error rates and power. The maximum likelihood ratio 
at each marker was used as a test statistic, as implemented in QTLRel 
(Cheng et al. 2011). 
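
Given the null distribution of the genome-wide maximum test statistic, the significance threshold and the empirical rejection rate follow directly. The sketch below, in Python, assumes the null maxima have already been generated (for example by permutation or parametric bootstrap); how to generate them validly for related individuals is the subject of Cheng and Palmer (2012) and is not shown. The function names are hypothetical.

```python
import math
from typing import Sequence


def empirical_threshold(null_max_stats: Sequence[float], alpha: float = 0.05) -> float:
    """Return a threshold exceeded by at most a fraction alpha of the null maxima."""
    ordered = sorted(null_max_stats)
    n_allowed_above = math.floor(alpha * len(ordered))
    return ordered[len(ordered) - n_allowed_above - 1]


def rejection_rate(replicate_max_stats: Sequence[float], threshold: float) -> float:
    """Fraction of replicates whose maximum statistic exceeds the threshold.

    With no QTL on the scanned chromosomes this estimates the type I error
    rate; with a QTL present it estimates power.
    """
    hits = sum(1 for stat in replicate_max_stats if stat > threshold)
    return hits / len(replicate_max_stats)


# Toy null distribution: the integers 1..100 stand in for null maxima.
toy_threshold = empirical_threshold(list(range(1, 101)), alpha=0.05)
toy_rate = rejection_rate([96, 10, 97, 95], toy_threshold)
```

In the toy example exactly five of the 100 null values lie above the returned threshold, which is the defining property of a 0.05-level cutoff.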

Dense marker simulations 

After completing the prior set of simulations, we were concerned that 
certain VM2 conditions failed to adequately control type I error rates. 
We hypothesized that this was attributable to the sparse nature of the 
markers, so we conducted simulations in which we varied the density 
of markers for model VM2. In this set of simulations we only 
considered the STR and simulated 15 chromosomes of length 200 cM 
with 500 polygenic QTL that were placed randomly across the first 
10 chromosomes. The additive and dominance effects of the poly- 
genic QTL were randomly generated using the same distributions 
as described previously, whereas the residual error was simulated 
from an $N(0, 0.8^2)$ distribution. In the simulations designed to evaluate 
power, we placed a QTL with an additive effect of 0.5 and a dominance 
effect of 0.2 at position 102| cM of the 11th chromosome; otherwise, 
there were no QTL on the last five chromosomes. Markers were 
spaced evenly on the first 10 chromosomes with intermarker distances 
of 4, 2, 1, 0.5, 0.25, 0.125, or 0.0625 cM. For chromosomes 11-15 we 
considered two cases: (a) markers were placed with the same density as 
on the first 10 chromosomes, or (b) markers were evenly spaced every 
2 cM. As the marker density increased in case (a), the distance between 
the QTL and its closest marker on chromosome 11 decreased; 
however, unlike in the sparse marker case, the QTL was never at a marker. 
In these sets of simulations, for the VM1 estimator we used the entire 
pedigree; for the VM2 estimator we estimated relatedness using the 
markers on the first 10 chromosomes. For the VM3 estimator we 
combined the VM1 and VM2 estimators, where we used the last three 
generations of the pedigree for VM1. Again, chromosomes 11-15 
were scanned for a QTL, and type I error rates and statistical power 
were estimated from 1000 replicates. 
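
The marker grids implied by the spacings above, and the way the distance from a locus to its nearest marker shrinks as density increases, can be sketched as follows. This Python sketch assumes markers start at 0 cM and cover the whole chromosome; the function names are hypothetical, and the position 101.0 cM is an arbitrary illustrative value, not the QTL position used in the simulations.

```python
from typing import List


def marker_positions(length_cm: float, spacing_cm: float) -> List[float]:
    """Evenly spaced marker positions from 0 to length_cm, inclusive."""
    n_intervals = round(length_cm / spacing_cm)
    return [i * spacing_cm for i in range(n_intervals + 1)]


def distance_to_nearest_marker(position_cm: float, spacing_cm: float) -> float:
    """Distance from a locus to the closest marker on an even grid."""
    nearest = round(position_cm / spacing_cm) * spacing_cm
    return abs(position_cm - nearest)


spacings = [4, 2, 1, 0.5, 0.25, 0.125, 0.0625]
markers_per_chromosome = {s: len(marker_positions(200, s)) for s in spacings}
# Hypothetical locus at 101.0 cM: distance to the nearest marker per spacing.
distances = {s: distance_to_nearest_marker(101.0, s) for s in spacings}
```

Under these assumptions a 200 cM chromosome carries 51 markers at 4 cM spacing and 3201 markers at 0.0625 cM spacing.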

Real dataset 

Finally, we used a published dataset from the 8th generation of 
a mouse AIL, which was bred from two inbred strains. This dataset 
consisted of 552 mice genotyped at 895 single-nucleotide 
polymorphisms (SNPs) and phenotyped for a quantitative trait, as described in 
Parker et al. (2011). A full pedigree back to the inbred founders was 
available. In our analyses we included both additive and dominance 
variance components in the model when estimating relatedness from 
the marker data or from the pedigree. All simulation code (Supporting 
Information, File S1) and the analyzed data set (File S2) are available at 
http://palmerlab.org/data/. 
RESULTS 



Table 2 Marker set power and error rates 



Sparse markers 

Results of the type I error simulations are shown in Table 2. It is clear 
that ignoring the relatedness of the subjects led to a highly inflated 
false-positive rate (Table 2; Naive). In the AIL population, only the 
final three generations were needed to obtain sufficiently accurate 
estimates of the relationship matrices to control type I error (Last3). 
In the STR population, by contrast, the last 12 generations were required to control 
the type I error rate (Last12). Although our simulations indicate that more than 
one generation typically will be required to control type I error rates, 
a full pedigree is not always needed. In general, it is prudent to use all 
available pedigree information, because the number of generations 
needed to control the type I error rate may not be known and using 
more generations than necessary had no negative impact on power. 

In an ideal situation the markers used to estimate the relationship 
matrices would exactly tag the polygenic loci (CL). As shown in Table 
2, under this condition we obtained the correct type I error rate and had 
the greatest power. A less optimal situation was that the markers were 
only in partial linkage disequilibrium (LD) with the polygenes (IL), or 
even worse, the markers were completely unlinked to the polygenes 
(UL); in both of these cases, type I error rates were inflated. When 
additional uninformative markers (markers on chromosomes 6-10) 
were added to the CL and IL cases (CUL and IUL), the behavior of the type I error 
rate was unchanged (controlled for CUL, still inflated for IUL); however, power 
in the CUL case was lower than in the CL case. Finally, unlike the previous cases in which 
information about relatedness was drawn from markers on chromosomes 
1-10 but the QTL scan was performed on chromosomes 11-15, we 
considered the case (CUL + 1) where markers on chromosomes 11-15 
were used both to estimate the relatedness and for the QTL scan. To 
make the results directly comparable to CUL, only markers on the 
chromosome being scanned were added to those on chromosomes 
1-10 (e.g., when scanning chromosome 11, markers on chromosomes 
1-11 were used to estimate relatedness). In this case, we found that the 
test was too conservative (the type I error rate fell well below 0.05), 
resulting in dramatically decreased power. We attribute this phenomenon to the effect of the 
QTL being partially captured by markers that are included in the 
polygenic term. Thus, the effect of the QTL is divided between the fixed and 
random terms in the linear mixed model. This phenomenon has 
recently been referred to as "proximal contamination" by Listgarten et al. 
(2012). This finding suggests that markers linked to the locus being 
scanned should not be included in the estimate of relatedness. 

Finally, we considered using both pedigree and marker 
information to model relatedness (VM3). We found that although incompletely 
linked markers alone or a partial pedigree alone could fail to control type I error, 
the combination of the two effectively controlled the type I error rate. 
This approach may result in increased power relative to use of the 
pedigree alone, but this difference, although suggestive, was not statistically 
significant in our simulations. 

Dense markers 

As shown in Table 2, when markers were incompletely linked to the 
polygenic QTL, the type I error rate was not adequately controlled. 
This incomplete linkage was a consequence of inadequate marker 
density; therefore, we explored the effect of increasing the marker 
density. As shown in Figure 1, when only SNPs were used to estimate 
relatedness (i.e., VM2) and when the marker density was inadequate, 
the type I error rate was inflated. Using both marker and partial or full 
pedigree data (i.e., VM3) prevented inflation of the type I error rate, 
without sacrificing much power. This approach provides better power 
than using the pedigree alone (i.e., VM1). 



                 AIL                            STR
Estimator        Type I error rate   Power      Type I error rate   Power
Naive            0.538**             -^a        0.802**             -
Last3            0.059               0.682      0.083**             -
Last6            0.051               0.674      0.060*              -
Last12           0.058               0.678      0.057               0.653
AllPed           0.049               0.676      0.053               0.664
CL               0.055               0.893      0.056               0.882
IL               0.099**             -          0.102**             -
UL               0.213**             -          0.241**             -
CUL              0.048               0.805      0.042*              0.800
IUL              0.078**             -          0.080**             -
CUL + 1          0.008**             0.559      0.014**             0.527
IUL + Last3      0.052               0.741      0.048               0.716
IUL + AllPed     0.040*              0.734      0.052               0.716

Table 2 legend: Type I error rate and power at significance level 0.05 under different marker sets 
and variance models. AIL, advanced intercross line; STR, structured population; 
AllPed, entire pedigree; CL, complete linkage; IL, incomplete linkage; UL, 
unlinked; CUL, completely linked and unlinked; IUL, incompletely linked and 
unlinked. 
* Indicates that the estimated type I error rate is significantly different from 0.05 at 
significance level 0.05. 
** Indicates that the estimated type I error rate is significantly different from 0.05 
at significance level 0.01. 
^a Power results are not shown when the type I error rate is inflated. 
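
Whether an estimated type I error rate differs from the nominal 0.05 can be checked with a simple two-sided test on the proportion of rejecting replicates. The source does not state which test produced the significance markers in Table 2, so the Wald test below (standard error computed from the observed rate, normal approximation) is an assumption, and the function name is hypothetical.

```python
import math


def type1_error_p_value(
    observed_rate: float, n_replicates: int, nominal: float = 0.05
) -> float:
    """Two-sided Wald p-value for H0: true rejection rate equals nominal."""
    se = math.sqrt(observed_rate * (1.0 - observed_rate) / n_replicates)
    if se == 0.0:
        # Degenerate case: every replicate gave the same outcome.
        return 1.0 if observed_rate == nominal else 0.0
    z = (observed_rate - nominal) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2.0))


# 2500 replicates were used in the sparse-marker simulations.
p_inflated = type1_error_p_value(0.060, 2500)
p_borderline = type1_error_p_value(0.059, 2500)
```

With 2500 replicates the standard error near 0.05 is roughly 0.004 to 0.005, so estimated rates within about 0.009 of 0.05 are not distinguishable from the nominal level under this test.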

Considering Table 2 and Figure 1, the ability of VM1 to control 
false-positive results was determined by the amount of pedigree 
information. VM2 depended on how accurately the markers captured 
information about the polygenes. VM3 provides a robust alternative 
when neither sufficient pedigree nor marker data are available. In 
general, VM1 was the least powerful, VM2 was the most powerful if 
markers linked to scanning loci were excluded in the estimation of 
relationship matrices, and VM3 was a compromise. 

Note that our reported powers will include positive associations 
even when the "significant" locus is far from the true QTL. The 
consequence is that the power values are greater than they would 
be under an approach that requires the association to be close to 
the true QTL. However, the relative powers of the methods will not 
be affected by the differences between these two approaches. 

Real data set 

Finally, we applied these methods to a real data set (Parker et al. 
2011). We estimated relatedness by using the full pedigree (AllPed, 
i.e., VM1) or all markers on the genome (AllSNP, i.e., VM2). In 
addition, because including SNPs on the chromosome being scanned 
in the relatedness estimation is overly conservative (Table 2, CUL + 1), 
we estimated relatedness by using all markers except those on the chromosome 
being scanned for a QTL (AllSNP-1). In Figure 2 we compare a version 
of VM3 that combines both AllSNP-1 and pedigree information 
(AllSNP-1 + AllPed) with either AllSNP (Figure 2A) or AllPed (Figure 
2B). The estimated heritability of this trait was 74.8% using VM3. 

Note that we did not compare AllSNP-1 to AllSNP. AllSNP-1 is 
comparable with IUL (Table 1) because IUL used SNP information 
from chromosomes 1-10 but scanned for QTL on chromosome 11. 
As shown in Table 2 and Figure 1, the ability of IUL to control the type 
I error rate depends on marker density. In this real data example, it was 
not clear whether our markers were sufficiently dense. Therefore, any 
apparent advantage in power of AllSNP-1 compared with AllSNP might 
be a result of a failure of AllSNP-1 to control the type I error rate. In 
situations in which sufficiently dense markers are available, AllSNP-1 
should control the type I error rate, as shown in Figure 1. The benefit in 
power of using AllSNP-1 + AllPed is demonstrated in Figure 2, where 
this method detected five genome-wide significant results, whereas 
AllPed detected one and AllSNP zero genome-wide significant loci. 

Figure 1 Estimated type I error rate and statistical power at significance level 0.05 for varying densities of markers. Marker densities on 
chromosomes 1-10 were varied, and markers on chromosomes 11-15 were either varied (left) or held fixed at 2 cM spacing (right). VM1: 
relationship matrices estimated using the entire pedigree (AllPed); VM2: relationship matrices estimated using genotypes on chromosomes 
1-10 (IL); VM3: relationship matrices estimated using both genotypes on chromosomes 1-10 (IL) and the last three generations of the pedigree 
(Last3). Red symbols indicate conditions with inflated type I error rate. 

DISCUSSION 

GWAS is a powerful tool for dissecting the genetic basis of 
quantitative traits. However, accurate inference depends on a valid 
test (i.e., correct type I error rates), a requirement that may not be met 
if either familial relatedness or population structure is not properly 
modeled. When working with model organisms, GWAS is often 
performed with the use of populations in which individuals are closely 
related to one another, necessitating a method to estimate the 
relatedness. This can be done using a pedigree, if available, but could 
potentially also be performed using observed genotype data. We found 
that estimates of relatedness that use sufficiently long pedigrees can 
control the type I error rate. Furthermore, marker-based estimates can 
also control the type I error rate if the markers are sufficiently dense to 
accurately estimate the realized relatedness at the polygenes. Perhaps 
more importantly, we find that an estimator that uses both pedigree 
information and genotype data gave consistently accurate type I error 
rates across differing levels of pedigree and genotype informativeness, 
even when using either pedigree or genotype data that alone would 
not result in a valid test. 

We also investigated how different approaches to estimating 
relatedness using marker data affect the power of a GWAS. We 
found power was increased by excluding markers that are in LD with 
the marker being tested. This finding is underscored by our analysis 
of the AIL mouse data set, where five loci reach genome-wide significance 
when this approach is used, whereas only one locus meets genome-wide 
significance when all markers are used to estimate the matrices. Note 
that both our real dataset and our simulations had relatively high 
heritabilities; however, we expect that our conclusions can be extended 
to traits with lower heritabilities. 
Figure 2 Mapping results of the B6xD2 F8 
methamphetamine sensitivity data (Parker 
et al. 2011). The green dashed line indicates 
the threshold for genome-wide significance 
at the 0.05 level. Red dots are the mapping 
results that estimate relatedness by combining 
AllSNP-1 and pedigree information. Black 
dots are the mapping results that estimate 
relatedness using (A) AllSNP or (B) AllPed. 
We propose that all markers on the chromosome being scanned be 
excluded from the relationship estimation. Further power 
improvements may be possible by excluding only those markers that are in 
LD with the locus being tested rather than all markers on that 
chromosome, though this would entail a more complicated 
implementation. Current methods for efficiently using mixed models in 
GWAS (Kang et al 2008; Cheng et al 2010; Meyer and Tier 2012; 
Zhou and Stephens 2012) would need modification and may lose 
computational efficiency. Excluding all markers on the chromosome 
allowed a reasonable compromise between computational speed and 
power. We do note that the gains in power obtained by excluding 
markers in LD with the tested locus are likely most important when 
working in populations where LD extends over a significant fraction 
of the chromosome, though we do not directly assess this here. 
The loss of power due to inclusion of markers in LD with the 
tested locus has recently been referred to as "proximal contamination" 
by Listgarten et al (2012). 
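
Excluding the scanned chromosome need not require recomputing the marker-based relationship matrix from scratch for every chromosome. The following Python sketch assumes that per-chromosome sums of the per-marker matrices (for example, the sum of $\Phi_{m,l}$ over the markers on one chromosome) and per-chromosome marker counts have been stored; the leave-one-chromosome-out mean is then the genome-wide total minus that chromosome's contribution. The function name and data layout are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]


def loco_relationship_matrices(
    chrom_sums: Dict[int, Matrix], chrom_counts: Dict[int, int]
) -> Dict[int, Matrix]:
    """For each chromosome c, return the mean per-marker matrix over all
    markers that are NOT on chromosome c."""
    chroms = list(chrom_sums)
    n = len(chrom_sums[chroms[0]])
    total_count = sum(chrom_counts[c] for c in chroms)
    # Genome-wide sum, computed once.
    total = [
        [sum(chrom_sums[c][j][k] for c in chroms) for k in range(n)]
        for j in range(n)
    ]
    out: Dict[int, Matrix] = {}
    for c in chroms:
        kept = total_count - chrom_counts[c]
        if kept <= 0:
            raise ValueError("no markers left after excluding chromosome")
        out[c] = [
            [(total[j][k] - chrom_sums[c][j][k]) / kept for k in range(n)]
            for j in range(n)
        ]
    return out


# Toy example with a single subject (1 x 1 matrices) and three chromosomes.
toy_sums = {1: [[2.0]], 2: [[1.0]], 3: [[3.0]]}
toy_counts = {1: 4, 2: 2, 3: 4}
loco = loco_relationship_matrices(toy_sums, toy_counts)
```

The same subtraction applies to the dominance matrix; only the stored per-chromosome sums change.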

Ideally, we would expect to obtain optimal power by not just 
excluding markers in LD with the locus being tested but by only using 
genotypes most informative of IBD sharing at the polygenic loci. Here 
we used IBS sharing as a proxy for IBD sharing, an approximation 
that is exact in the AIL and STR populations used here. In populations 
where IBS is less indicative of IBD (e.g., natural populations) recent 
advances allow for highly accurate estimates of IBD sharing given 
sufficiently dense marker data (Han and Abney 2011, 2013). We 
expect using IBD estimates obtained from such methods, rather than 
solely using pedigree-based estimates of relatedness, will provide gains 
similar to what we obtained here. Our results, then, should provide 
practical guidance to researchers seeking to model polygenic variation 
in support of GWAS and related study designs. 